This lab is meant to be run as a jupyter notebook, you
could download it here (the zip-file contains the
notebook and the database, and a .html-file
which you can read in case you have problems reading/running jupyter
notebooks).
To start your notebook, you can use the following commands (after
you’ve downloaded the zip file – observe that the
$ is the shell prompt, you shouldn’t type it):
$ unzip lab1.zip
$ jupyter notebook # or jupyter labThis should start jupyter in a browser tab, and there you can click “lab1.ipynb”.
We have a database to handle the academic achievements of students at LTH – in it we have four tables:
students:
|
|
|---|---|
ssn
|
social security number |
first_name
|
first name (obviously!) |
last_name
|
last name |
departments:
|
|
|---|---|
department_code
|
unique code for each department |
department_name
|
the name of the department, in Swedish |
courses:
|
|
|---|---|
course_code
|
course code, like EDAF75
|
course_name
|
the name of the course, in Swedish (like “Databasteknik”) |
department_code
|
the department giving the course |
level
|
the course level, like “G1”, “G2”, or “A” |
credits
|
the number of credits for the course, like 7.5 |
finished_courses:
|
|
|---|---|
ssn
|
the social security number of a student |
course_code
|
the course code for the course the student has taken |
grade
|
the grade for the student passing the course |
Some sample data:
ssn first_name last_name
--- ---------- ---------
861103–2438 Bo Ek
911212–1746 Eva Alm
950829–1848 Anna Nyström
... ... ...
department_code department_name
--------------- ----------------------------------------
eda Datavetenskap
edi Informationsteori
eem Elektrisk mätteknik
eie Industriell elektroteknik och automation
... ...
course_code course_name department_code level credits
----------- ---------------------------------- --------------- ----- -------
ETTN05 Adaptiv signalbehandling eit A 7.5
FMAN10 Algebraiska strukturer fma A 7.5
ETIN80 Algoritmer i signalprocessorer... eit A 7.5
EDAF05 Algoritmer, datastrukturer och... eda G2 5.0
... ... ... ... ...
ssn course_code grade
--- ----------- -----
861103–2438 EDA016 4
861103–2438 EDAA01 3
911212–1746 EDA016 3
... ... ...
The tables have been created with the following SQL statements:
CREATE TABLE students (
ssn CHAR(11),
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
PRIMARY KEY (ssn)
);
CREATE TABLE departments (
department_code TEXT,
department_name TEXT,
PRIMARY KEY (department_code)
);
CREATE TABLE courses (
course_code CHAR(6),
course_name TEXT NOT NULL,
department_code TEXT,
level CHAR(2),
credits DOUBLE NOT NULL CHECK (credits >= 0),
PRIMARY KEY (course_code),
FOREIGN KEY (department_code) REFERENCES departments(department_code)
);
CREATE TABLE finished_courses (
ssn CHAR(11),
course_code CHAR(6),
grade INTEGER NOT NULL CHECK (grade >= 3 AND grade <= 5),
PRIMARY KEY (ssn, course_code),
FOREIGN KEY (ssn) REFERENCES students(ssn),
FOREIGN KEY (course_code) REFERENCES courses(course_code)
);All courses offered at the “Computer Science and Engineering” program at LTH during the academic year 2013/14 are in the table ’courses`. Also, the database has been filled with made up data. SQL statements like the following have been used to insert the data:
INTO students(ssn, first_name, last_name)
VALUES ('950705-2308', 'Anna', 'Johansson'),
('930702-3582', 'Anna', 'Johansson'),
('911212-1746', 'Eva', 'Alm'),
('910707-3787', 'Eva', 'Nilsson'),
...The information about which departments gives which courses is somewhat haphazard (some of the departments in the database no longer exist, and some of them may never have existed). Our database assumes each course is given by a single department, in real life, departments can share courses – if you want to practice what we’re talking about in week 2 and week 3, try to come up with a modification of the database design, which allows departments to share courses (you can ask your TA about it during the lab session).
To pass this lab, you only need to show your solutions to the problems marked REVIEW below – that’s the only ones we’ll be looking at during the lab sessions. You can skip the other problems at your own discretion, but I strongly recommend that you try them out, for practice.
As said above, this lab is designed to be run as a jupyter notebook (either using =jupyter lab=, or =jupyter notebook=). If you haven’t been able to get =jupyter= up and running on your computer, you can run the sql-commands for the problems below interactively inside =sqlite3=, or as a script file with all your commands, or inside some IDE, like DB Browser for SQLite.
If you’re running the lab as a notebook (and hopefully you are!), evaluate the following cells before you begin:
%load_ext sql%sql sqlite:///lab1.sqlite| Jump to REVIEW problems |
The tables students, departments,
courses and finished_courses are already in
your database, you can see some of their contents by running the cells
below:
%%sql
SELECT *
FROM students
LIMIT 4%%sql
SELECT *
FROM departments
LIMIT 4%%sql
SELECT *
FROM courses
LIMIT 4%%sql
SELECT *
FROM finished_courses
LIMIT 4If you inadvertently change the contents of the tables, you can always recreate the them with the following command (it must be run at the command line):
$ sqlite3 lab1.sqlite < lab1-setup.sqlWhat are the names (first name and last name) of all the students?
%%sqlNow sort the names, first by last name and then by first name:
%%sqlWhen you get it to work, experiment by listing only the 10 first students (alphabetically), then try to list only students 11-20, etc.:
%%sqlWhat are the names of the students who were born in 1985?
%%sqlHint: the substr function can be useful (it also goes by
the name substring).
The penultimate digit in the social security number is even for females, and odd for males. List the first names of all female students in our database alphabetically.
%%sqlNow try to output each name only once (so, no duplicates).
%%sqlHow many students are registered in the database?
%%sqlHow many male students are there?
%%sqlNow try to output the number of distinct names in the listing of
female students above, using count. It turns out that this
is a bit tricky, we need to make sure we use the word
DISTINCT in the right place (look carefully in the documentation).
%%sqlIn the next few queries, we’ll look at the results of the student
with the social security number 910101-1234 – to make things a
lot easier, start by creating a VIEW with all his
results.
%%sql
DROP VIEW IF EXISTS ...;
CREATE VIEW ... ASMake sure the view contains all data pertinent to the student in question (it will make the following queries very simple).
Which courses (course codes only) have been taken by the student?
%%sqlWhat are the names of these courses, and how many credits do they give?
%%sqlHow many credits has the student taken?
%%sqlWhat is the student’s grade average? It turns out that there are actually (at least) two different averages at play here:
the unweighted average, i.e., just the average of all the students grades (no matter how many credits each course gives), and
the weighted average, where we use the credits for a course as a weight.
First the unweighted average:
%%sqlAnd then the weighted average (feel free to ask me about this during QA sessions, if you get stuck):
%%sqlHint: If you’ve created a proper view above, we’ll get a ‘table’ with
one row for each course the student has passed, and each row will
contain information about grades and credits for the passed course. If
we use arithmetic operations in a select statement, and then use an
aggregate function around that operation, we’ll apply the aggregate
function to each value the operation returns (so, e.g., a
sum over a product will be a scalar product).
Now drop the view:
%%sqlHow many courses are there for each level (G1,
G2, and A)?
%%sqlFor each level, how many courses give more than 7.5 HP – list only those categories with more than 5 such courses?
%%sqlFor the five departments which offers the most total credits (for its courses in this database) – output the name of the department, and the total number of offered credits:
%%sqlWhich students (ssn and full name) have taken 0 credits?
This problem can be solved in several ways, first do it with an outer
join:
%%sqlNow do the same thing using a subquery:
%%sqlList the names and average grades of the 10 students with the highest grade average? You can use the unweighted average.
%%sqlList the social security number and total number of credits for all
students – order by total credits, descending. Students with no credits
should be included in the listing, with 0 credits (not
NULL).
Use an outer JOIN to solve the problem – you might want
to use the function coalesce(v1, v2, ...); it returns the
first value which is not NULL, so
coalesce(avg(grade), 0) would give 0 if the were no grades
(i.e., if grade were NULL), you can also try
the ifnull function.
%%sqlDo all students have unique names (first name and last name)? If not, show the full name and social security number for all students who have a namesake.
As usual there are several ways of solving this, solve it using a
WITH-statement where you create a ‘table’ with all
duplicate names, and then:
Use a JOIN:
%%sqlUse a subquery:
%%sql